{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "stable_baselines_HER.ipynb",
      "provenance": [],
      "collapsed_sections": [],
      "include_colab_link": true
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    },
    "accelerator": "GPU"
  },
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "view-in-github",
        "colab_type": "text"
      },
      "source": [
        "<a href=\"https://colab.research.google.com/github/Stable-Baselines-Team/rl-colab-notebooks/blob/master/stable_baselines_her.ipynb\" target=\"_parent\"><img src=\"https://colab.research.google.com/assets/colab-badge.svg\" alt=\"Open In Colab\"/></a>"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "hyyN-2qyK_T2",
        "colab_type": "text"
      },
      "source": [
        "# Stable Baselines - Hindsight Experience Replay on Highway Env\n",
        "\n",
        "Github Repo: [https://github.com/hill-a/stable-baselines](https://github.com/hill-a/stable-baselines)\n",
        "\n",
        "Medium article: [https://medium.com/@araffin/stable-baselines-a-fork-of-openai-baselines-df87c4b2fc82](https://medium.com/@araffin/stable-baselines-a-fork-of-openai-baselines-df87c4b2fc82)\n",
        "\n",
        "Highway env: [https://github.com/eleurent/highway-env](https://github.com/eleurent/highway-env) \n",
        "\n",
        "[RL Baselines Zoo](https://github.com/araffin/rl-baselines-zoo) is a collection of pre-trained Reinforcement Learning agents using Stable-Baselines.\n",
        "\n",
        "It also provides basic scripts for training, evaluating agents, tuning hyperparameters and recording videos.\n",
        "\n",
        "Documentation is available online: [https://stable-baselines.readthedocs.io/](https://stable-baselines.readthedocs.io/)\n",
        "\n",
        "## Install Dependencies and Stable Baselines Using Pip\n",
        "\n",
        "List of full dependencies can be found in the [README](https://github.com/hill-a/stable-baselines).\n",
        "\n",
        "```\n",
        "sudo apt-get update && sudo apt-get install cmake libopenmpi-dev zlib1g-dev\n",
        "```\n",
        "\n",
        "\n",
        "```\n",
        "pip install stable-baselines[mpi]\n",
        "```"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "qPg7pyvK_Emi",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Stable Baselines only supports tensorflow 1.x for now\n",
        "%tensorflow_version 1.x\n",
        "# Install stable-baselines latest version\n",
        "!pip install stable-baselines[mpi]==2.10.2"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "4wA6lU254uNl",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Install highway-env\n",
        "!pip install git+https://github.com/eleurent/highway-env#egg=highway-env"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "FtY8FhliLsGm",
        "colab_type": "text"
      },
      "source": [
        "## Import policy, RL agent, ..."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "BIedd7Pz9sOs",
        "colab_type": "code",
        "outputId": "835096d4-cec8-44e9-f1e4-39901b34009d",
        "colab": {
          "base_uri": "https://localhost:8080/",
          "height": 53
        }
      },
      "source": [
        "import gym\n",
        "import highway_env\n",
        "import numpy as np\n",
        "\n",
        "from stable_baselines import HER, SAC, DDPG\n",
        "from stable_baselines.ddpg import NormalActionNoise"
      ],
      "execution_count": 0,
      "outputs": [
        {
          "output_type": "stream",
          "text": [
            "pygame 1.9.6\n",
            "Hello from the pygame community. https://www.pygame.org/contribute.html\n"
          ],
          "name": "stdout"
        }
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RapkYvTXL7Cd",
        "colab_type": "text"
      },
      "source": [
        "## Create the Gym env and instantiate the agent\n",
        "\n",
        "For this example, we will be using the parking environment from the [highway-env](https://github.com/eleurent/highway-env) repo by @eleurent.\n",
        "\n",
        "The parking env is a goal-conditioned continuous control task, in which the vehicle must park in a given space with the appropriate heading.\n",
        "\n",
        "\n",
        "![parking-env](https://raw.githubusercontent.com/eleurent/highway-env/gh-media/docs/media/parking-env.gif)\n",
        "\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "QT5Io2LwAzlp",
        "colab_type": "text"
      },
      "source": [
        "### Train Soft Actor-Critic (SAC) agent\n",
        "\n",
        "Here, we use HER \"future\" goal sampling strategy, where we create 4 artificial transitions per real transition\n",
        "\n",
        "Note: the hyperparameters (network architecture, discount factor, ...) where tuned for this task"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "TzkK23C2BCKr",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "env = gym.make(\"parking-v0\")"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "QDdPRp0f5Bcz",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# SAC hyperparams:\n",
        "model = HER('MlpPolicy', env, SAC, n_sampled_goal=4,\n",
        "            goal_selection_strategy='future',\n",
        "            verbose=1, buffer_size=int(1e6),\n",
        "            learning_rate=1e-3,\n",
        "            gamma=0.95, batch_size=256,\n",
        "            policy_kwargs=dict(layers=[256, 256, 256]))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "P82f568g5g0q",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Train for 1e5 steps\n",
        "model.learn(int(1e5))\n",
        "# Save the trained agent\n",
        "model.save('her_sac_highway')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "nGa3dNn36PTX",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Load saved model\n",
        "model = HER.load('her_sac_highway', env=env)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "Jg_7vKNGA6Hf",
        "colab_type": "text"
      },
      "source": [
        "#### Evaluate the agent"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "D3sxhf1Q6NlL",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "obs = env.reset()\n",
        "\n",
        "# Evaluate the agent\n",
        "episode_reward = 0\n",
        "for _ in range(1000):\n",
        "\taction, _ = model.predict(obs)\n",
        "\tobs, reward, done, info = env.step(action)\n",
        "\tepisode_reward += reward\n",
        "\tif done or info.get('is_success', False):\n",
        "\t\tprint(\"Reward:\", episode_reward, \"Success?\", info.get('is_success', False))\n",
        "\t\tepisode_reward = 0.0\n",
        "\t\tobs = env.reset()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "fMjiWlSEi-3n",
        "colab_type": "text"
      },
      "source": [
        "### Train DDPG agent"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "FCLW5cGIZa52",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create the action noise object that will be used for exploration\n",
        "n_actions = env.action_space.shape[0]\n",
        "noise_std = 0.2\n",
        "action_noise = NormalActionNoise(mean=np.zeros(n_actions), sigma=noise_std * np.ones(n_actions))\n",
        "\n",
        "model = HER('MlpPolicy', env, DDPG, n_sampled_goal=4,\n",
        "            goal_selection_strategy='future',\n",
        "            verbose=1, buffer_size=int(1e6),\n",
        "            actor_lr=1e-3, critic_lr=1e-3, action_noise=action_noise,\n",
        "            gamma=0.95, batch_size=256,\n",
        "            policy_kwargs=dict(layers=[256, 256, 256]))"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "PXAQeENljBIe",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "model.learn(int(2e5))\n",
        "\n",
        "model.save('her_ddpg_highway')"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "B8uJjxQLBxdS",
        "colab_type": "text"
      },
      "source": [
        "#### Evaluate the agent"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "XjohypRhjEjS",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "obs = env.reset()\n",
        "\n",
        "# Evaluate the agent\n",
        "episode_reward = 0\n",
        "for _ in range(1000):\n",
        "\taction, _ = model.predict(obs)\n",
        "\tobs, reward, done, info = env.step(action)\n",
        "\tepisode_reward += reward\n",
        "\tif done or info.get('is_success', False):\n",
        "\t\tprint(\"Reward:\", episode_reward, \"Success?\", info.get('is_success', False))\n",
        "\t\tepisode_reward = 0.0\n",
        "\t\tobs = env.reset()"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "Av5BQaUeXOEZ",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        ""
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}
